Target templates and the time course of distractor location learning

When searching for a shape target, colour distractors typically capture our attention. Capture is smaller when observers search for a fixed target that allows for a feature-specific target template compared to a varying shape singleton target. Capture is also reduced when observers learn to predict the likely distractor location. We investigated how the precision of the target template modulates distractor location learning in an additional singleton search task. As observers are less prone to capture with a feature-specific target, we assumed that distractor location learning is less beneficial and therefore less pronounced than with a mixed-feature target. Hierarchical Bayesian parameter estimation was used to fit fine-grained distractor location learning curves. A model-based analysis of the time course of distractor location learning revealed an effect on the asymptotic performance level: when searching for a fixed-feature target, the asymptotic distractor cost indicated smaller distractor interference than with a mixed-feature target. Although interference was reduced for distractors at the high-probability location in both tasks, asymptotic distractor suppression was less pronounced with fixed-feature compared to mixed-feature targets. We conclude that with a more precise target template less distractor location learning is required, likely because the distractor dimension is down-weighted and its salience signal reduced.

Data and fits. Observed data (points) and predicted data as the mean of each participant's expected RT value at each level (means of the posterior predictive distributions of the RTs; solid lines) of the fixed-feature task and the mixed-feature task are shown in blue and green, respectively. The shaded areas depict the 95% confidence intervals over these participant scores. Response times are calculated relative to distractor-absent trials.

A3: Literature-based prior on m
For most hyperparameters we could choose vaguely informative priors that let the data govern the outcome of the estimates. However, for the parameter m, which models the maximum performance decrease caused by a distractor farthest from the location that can be effectively inhibited, the prior needed to be more informative. The reason for this is that the location with the maximum performance decrease is not necessarily contained in the display. The largest distance between the high-probability location and a distractor in our displays was 11.20 cm and, hypothetically, a distractor at this distance could still receive some inhibition radiating from the high-probability location. A distractor somewhat farther away from the high-probability location might receive less inhibition and therefore have a higher impact on performance. As can be seen in Figure S1, it is not obvious whether the data points converge to a constant level of impairment or whether the curves would continue to increase. This ambiguity is also present for the model, and hence informative hyperpriors are required to regularize the fitting. To obtain information about the maximum performance decrease that might be observed with an uninhibited distractor, we sampled the literature on attentional capture to find research articles that met our criteria (target is a shape singleton, distractor is a colour singleton, 6-10 stimuli in the visual search display, no spatial learning, 30-70% distractor-absent trials). This approach allowed us to estimate the maximum capture observed. We calculated the mean and standard deviation of the attentional capture values (distractor present − distractor absent) reported in 14 research articles 5-18 (see Figure S2), which included a total of 36 experiments. We used these values as a hyperprior for the parameter mµ.
Note that many of the values that informed these priors had to be read off from bar charts in the cited papers and hence might lack precision. Nevertheless, the purpose of the described procedure was to obtain a rough ballpark estimate of plausible values that can be fed into the prior. For the hyperprior mσ (which models the dispersion in the population), we chose a uniform distribution from 0 to 20, which should be sufficiently vague to allow enough variability.
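The derivation of the literature-based hyperprior can be sketched as follows. This is a minimal illustration only: the capture values are placeholders rather than the values read off from the cited articles, the millisecond unit is assumed, and the use of the mean and standard deviation as the location and scale of a normal hyperprior is an assumption about how "these values" enter the model.

```python
import statistics

# Hypothetical capture effects (distractor present minus distractor absent),
# assumed to be in ms. These are placeholders, NOT the literature values.
capture_effects_ms = [18.0, 25.0, 31.0, 42.0, 27.0, 36.0, 22.0, 48.0]


def hyperprior_from_literature(values):
    """Return the mean and sample SD of the reported capture effects."""
    return statistics.mean(values), statistics.stdev(values)


prior_mean, prior_sd = hyperprior_from_literature(capture_effects_ms)

# Assumed form: a normal hyperprior on m_mu centred on the literature mean.
print(f"m_mu hyperprior: location={prior_mean:.1f}, scale={prior_sd:.1f}")

# Bounds of the uniform hyperprior on m_sigma, as stated in the text.
m_sigma_bounds = (0.0, 20.0)
```

Because many of the underlying values are imprecise, only the rough location and spread matter here, not the individual entries.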

A7: Potential contribution to a priority map
If we interpret the suppression map as a factor that contributes to an overall priority map, the question arises how it interacts with, for instance, stimulus-driven activation. One possible interaction is illustrated in Figure S5. Figure S5 (label a) shows how stimulus-driven and goal-driven activations combine into an integrated priority map. The strength of the activation at each location on this map determines how attention is deployed. In experiments like that of the present study, the probability of selecting the target or the distractor would be proportional to the ratio of the activations at the respective locations. In the beginning, the distractor activation is still rather strong, hence distractors capture attention frequently. In this example, the limited template precision leads to a low signal-to-noise ratio (Figure S5, label b). Hence, non-target and distractor locations receive some goal-driven activation, and consequently, in early trials, the distractor is a strong competitor to the target for selection. A more precise target template can lead to a more distinct target activation in the priority map, rendering capture by the distractor less likely. Whenever the distractor captures attention, a feedback signal is sent to the suppression map (Figure S5, label c), strengthening suppression of the respective location. It follows directly that less distractor capture (e.g., with more specific templates) leads to less frequent "deepening" of the suppression valley at the distractor location (i.e., less distractor location learning). Moreover, as the location becomes more and more suppressed (Figure S5, label d), capture occurs less often, causing the decline to be fast in the beginning and then level off, in agreement with our exponential learning curve model. Overall, this process reflects the transition from initially stimulus-driven toward predominantly goal-driven attentional guidance (see Vecera et al. 19).
Supplementary Figure S5. In this hypothetical process, the priority map is a combination of stimulus-driven and goal-driven activations, and the suppression map. Stimulus-driven and goal-driven activations in this illustration would already be the combinations across feature dimensions 20. The rows in this figure show some exemplary high-probability location trials. The labels (a) to (d) are explained in the text. The rightmost column of the figure was adopted from Vecera et al. 19.
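The feedback loop described in A7 can be illustrated with a toy simulation. This is a sketch under strong assumptions and not the model fitted in the paper: the function `simulate_capture`, the activation values, the learning rate, and the update rule (suppression grows in proportion to the expected capture on each trial) are all hypothetical choices made for illustration.

```python
def simulate_capture(target_activation, distractor_activation=1.0,
                     learning_rate=0.1, n_trials=200):
    """Expected capture probability per trial in a toy suppression loop."""
    suppression = 0.0  # 0 = no suppression, 1 = full suppression
    capture_probs = []
    for _ in range(n_trials):
        # Suppression scales down the distractor's activation on the map.
        effective = distractor_activation * (1.0 - suppression)
        # Selection follows the ratio of activations on the priority map.
        p_capture = effective / (effective + target_activation)
        capture_probs.append(p_capture)
        # Feedback: expected capture deepens the suppression valley.
        suppression += learning_rate * p_capture * (1.0 - suppression)
    return capture_probs, suppression


# A less precise template (weaker target activation) versus a more
# precise template (stronger target activation); values are hypothetical.
imprecise, s_imprecise = simulate_capture(target_activation=1.0)
precise, s_precise = simulate_capture(target_activation=2.0)

print(f"initial capture: {imprecise[0]:.2f} vs {precise[0]:.2f}")
print(f"final suppression: {s_imprecise:.2f} vs {s_precise:.2f}")
```

In this toy version, capture declines quickly at first and then levels off, and the more distinct target activation yields both less initial capture and less accumulated suppression, which mirrors the qualitative argument in the text.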

Supplementary Methods B: Learning curve model
The learning curve model captures how the RTs decrease over the course of the experiment; it is specified in the methods section of the paper.
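For orientation, a generic exponential learning curve can be written as below. The exact parameterization used in the paper is given in its methods section; the three-parameter form here (start level, asymptote, rate) and all numeric values are assumptions chosen only to illustrate the shape of such a curve.

```python
import math


def learning_curve(trial, start, asymptote, rate):
    """RT at a given trial: exponential approach from start to asymptote."""
    return asymptote + (start - asymptote) * math.exp(-rate * trial)


# Hypothetical parameter values (assumed ms), for illustration only.
rts = [learning_curve(t, start=700.0, asymptote=600.0, rate=0.01)
       for t in range(0, 1001, 100)]
print([round(rt, 1) for rt in rts])
```

The curve starts at the starting RT level, drops steeply in early trials, and flattens as it approaches the asymptotic RT level.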

Note:
The code for the model implementation can be found at https://github.com/AylinH/HanneEtAl2022_DisLocLearning. Moreover, the traces sampled from the model are available at https://osf.io/5wcex/ and can be accessed using ArviZ 2 if more details are needed than presented here.

B1: Model structure, priors, sampling parameters
Hyperpriors:
Note: Due to the exponential transformation on the participant level, these hyperpriors effectively parameterize log-normal group-level distributions.

Models are ranked from the best model (rank 0) to the worst model (rank 3) based on their leave-one-out cross-validation (LOO) scores. pLOO is the estimated effective number of parameters, dLOO indicates the relative difference of the LOO score to the best model, dSE indicates its standard error, and weight estimates the model's weight if it were used in model averaging.
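The note on the exponential transformation can be checked numerically: exponentiating normally distributed participant-level values yields a log-normal distribution across participants. The sketch below uses hypothetical group-level values (`group_mu`, `group_sigma`) and a large simulated sample; it does not reproduce the priors of the actual model.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical group-level parameters on the log scale.
group_mu, group_sigma = 0.0, 0.3

# Participant-level values are normal on the latent (log) scale ...
latent = [rng.gauss(group_mu, group_sigma) for _ in range(20000)]
# ... and log-normal after the exponential transformation.
participant_values = [math.exp(z) for z in latent]

median = statistics.median(participant_values)
mean = statistics.fmean(participant_values)
print(f"median={median:.3f} (exp(mu)={math.exp(group_mu):.3f}), mean={mean:.3f}")
```

The transformed values are strictly positive and right-skewed, with a median near exp(mu) and a mean above the median, as expected for a log-normal distribution.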

B4: Participant-level estimates
Search strategies are determined by various factors (e.g., Adam et al. 21; Irons & Leber 22), and some participants tend to use search strategies that are, from an optimal observer's point of view, not very efficient. Accordingly, some participants might have searched for a shape singleton in our fixed-feature search task rather than searching for the exact target feature. Leber and Egeth 23 found that participants stick with the search mode they experienced as being effective in previous search trials. For instance, participants who had performed singleton search for several trials continued to search for a singleton target even though the search arrays allowed searching for a specific target feature (see also Bacon & Egeth 24).
We looked at the participant-level estimates for each individual's learning curve parameters to examine whether there is a mix of searchers who apply feature search and searchers who apply singleton detection in the fixed-feature task. To look for evidence of subgroups with different strategies, we took several measures. First, we plotted the individual means for distractor-absent and distractor-present trials in the fixed-feature task to check for possible differences in the data patterns. Participants who use singleton search should show slower RTs and, as they are vulnerable to any singleton, larger capture effects when distractors are presented at low-probability locations. No such pattern was visible in the data (see Figure S7a). Second, if there was a mixture of search modes, we would expect the variability to be greater in the fixed-feature task than in the mixed-feature task. However, the standard deviation is larger in the mixed-feature task (SD = 57.3) than in the fixed-feature task (SD = 45.56).
Concerning our modeling results, with a mixture of search modes, we would also expect larger variability in the fixed-feature task's starting RT level or, if the different strategies only manifest over time, in the asymptotic RT levels. We plotted the starting RT level in distractor-absent trials (see Figure S7b) and the asymptotic RTs (see Figure S7c) for each participant. If different search modes were used in the fixed-feature task, we would expect the estimated values to cluster into two subgroups. One subgroup would be about as slow as the participants in the mixed-feature task, and the other subgroup would tend to be faster. We see no such pattern (or any other conspicuous feature) in the starting level or asymptotic RTs. The participants in the fixed-feature task seem to come from one group that tends to be faster than the mixed-feature group.
Supplementary Figure S7. a) Search response times for the fixed-feature task. Box-whisker plots show mean response times as a function of distractor-absent trials (grey), and distractor-present trials (coloured), with the distractor presented at the high-probability location (light blue) and at the low-probability locations (dark blue). b) Forest plots showing the starting RT level for each participant with the mode (circles), HPD (horizontal lines) for the fixed-feature task (blue, lower panel) and the mixed-feature task (green, upper panel). c) Forest plots showing the asymptotic RT levels for each participant; the logic of visualization is the same as in b). Best viewed digitally with magnification.